ABSTRACT

Systems and method for organizing information relating to the design of polymer probe array chips including oligonucleotide array chips. A database model is provided which organizes information interrelating probes on a chip, genomic items investigated by the chip, and sequence information relating to the design of the chip. The model is readily translatable into database languages such as SQL. The database model scales to permit storage of information about large numbers of chips having complex designs.

METHOD AND SYSTEM FOR PROVIDING A PROBE ARRAY CHIP DESIGN DATABASE

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from U.S. Prov. App. No. 60/053,842 filed Jul. 25, 1997, entitled COMPREHENSIVE BIO-INFORMATICS DATABASE, from U.S. Prov. App. No. 60/069,198 filed on Dec. 11, 1997, entitled COMPREHENSIVE DATABASE FOR BIOINFORMATICS, and from U.S. Prov. App. No. 60/069,436, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed on Dec. 11, 1997. The contents of all three provisional applications are herein incorporated by reference.

The subject matter of the present application is related to the subject matter of the following three co-assigned applications filed on the same day as the present application: GENE EXPRESSION AND EVALUATION SYSTEM, METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE, METHOD AND SYSTEM FOR PROVIDING A POLYMORPHISM DATABASE. The contents of these three applications are herein incorporated by reference.

BACKGROUND OF THE INVENTION

The present invention relates to the collection and storage of information pertaining to chips for processing samples.

Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed in arrays according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S. Pat. No. 5,571,639, both incorporated herein by reference for all purposes.

According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers), HIV, and other genetic characteristics.

Computer-aided techniques for monitoring gene expression using such arrays of probes have also been developed as disclosed in U.S. patent application Ser. No. 08/828,952 and PCT publication No. WO 97/10365, the contents of which are herein incorporated by reference. Many disease states are characterized by differences in the expression levels of various genes either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes to the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.

As can be seen, the probe array chips are designed to answer questions about genomic items, herein defined to include genes, expressed sequence tags (ESTs), gene clusters, and EST clusters. Associated with information about genomic items is genetic sequence information concerning the base sequences of genomic items. Probes are designed and selected for inclusion on a chip based on: 1) the identity of the genomic items to be investigated by the chip, 2) the sequence information associated with those genomic items, and 3) the type of information sought, e.g., expression analysis, polymorphism analysis, etc. The interrelationships among probes, genomic items, and sequence information are, however, extremely complex, greatly complicating the tasks of designing chips, effectively exploiting chips that have already been designed, and efficiently interpreting the information generated by application of the chips.

Moreover, it is contemplated that the operations of chip design, construction, and application will occur on a very large scale. The quantity of information related to chip design to store and correlate is vast. What is needed is a system and method suitable for storing and organizing large quantities of information used in conjunction with the design of probe array chips.

SUMMARY OF THE INVENTION

The present invention provides systems and method for organizing information relating to the design of polymer probe array chips including oligonucleotide array chips. A database model is provided which organizes information interrelating probes on a chip, genomic items investigated by the chip, and sequence information relating to the design of the chip. The model is readily translatable into database languages such as SQL. The database model scales to permit storage of information about large numbers of chips having complex designs.

According to one aspect of the present invention, a computer-readable storage medium is provided. A relational database is stored on this medium. The relational database includes: a probe table including a plurality of probe records, each of the probe records specifying a polymer probe for use in one or more polymer probe arrays, a sequence item table including a plurality of sequence item records, each of the sequence item records specifying a nucleotide sequence to be investigated in the one or more polymer probe arrays, wherein there is a many-to-many relationship between the probe records and the sequence item records.

A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an overall system and process for forming and analyzing arrays of biological materials such as DNA or RNA.

FIG. 2A illustrates a computer system suitable for use in conjunction with the overall system of FIG. 1.

FIG. 2B illustrates a computer network suitable for use in conjunction with the overall system of FIG. 1.

FIG. 3 illustrates a key for interpreting a database model.

FIG. 4 illustrates a database model for maintaining information for the system and process of FIG. 1 according to one embodiment of the present invention.
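
By way of illustration only, the many-to-many relationship between probe records and sequence item records recited in the Summary may be sketched in SQL, here run through Python's standard sqlite3 module. This is a minimal sketch under stated assumptions: the table name probe_sequence_map, all column names, and the sample rows are hypothetical and are not taken from the specification or from FIG. 4.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the schema of FIG. 4.
SCHEMA = """
CREATE TABLE probe (
    probe_id INTEGER PRIMARY KEY,
    bases    TEXT NOT NULL
);
CREATE TABLE sequence_item (
    sequence_item_id INTEGER PRIMARY KEY,
    name             TEXT NOT NULL
);
-- Junction table: one row per (probe, sequence item) association.
CREATE TABLE probe_sequence_map (
    probe_id         INTEGER NOT NULL REFERENCES probe(probe_id),
    sequence_item_id INTEGER NOT NULL REFERENCES sequence_item(sequence_item_id),
    PRIMARY KEY (probe_id, sequence_item_id)
);
"""


def build_example():
    """Create an in-memory database with two probes and two sequence items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO probe VALUES (?, ?)",
                     [(1, "ACGTACGTAC"), (2, "TTGACCAAGT")])
    conn.executemany("INSERT INTO sequence_item VALUES (?, ?)",
                     [(10, "seq_A"), (11, "seq_B")])
    # Probe 1 interrogates both sequence items; sequence item 10 is
    # interrogated by both probes.
    conn.executemany("INSERT INTO probe_sequence_map VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 10)])
    return conn


def sequence_items_for_probe(conn, probe_id):
    """Return the names of the sequence items associated with one probe."""
    rows = conn.execute(
        """SELECT s.name
           FROM sequence_item AS s
           JOIN probe_sequence_map AS m
             ON m.sequence_item_id = s.sequence_item_id
           WHERE m.probe_id = ?
           ORDER BY s.name""",
        (probe_id,),
    ).fetchall()
    return [name for (name,) in rows]


def probes_for_sequence_item(conn, sequence_item_id):
    """Return the ids of the probes associated with one sequence item."""
    rows = conn.execute(
        "SELECT probe_id FROM probe_sequence_map "
        "WHERE sequence_item_id = ? ORDER BY probe_id",
        (sequence_item_id,),
    ).fetchall()
    return [pid for (pid,) in rows]
```

The junction table carries a composite key so that the same probe may be paired with several sequence items and the same sequence item with several probes, without duplicating either record.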

DESCRIPTION OF SPECIFIC EMBODIMENTS

Biological Material Analysis System

One embodiment of the present invention operates in the context of a system for analyzing biological or other materials using arrays that themselves include probes that may be made of biological materials such as RNA or DNA. The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on chips. See U.S. Pat. No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the target nucleic acid).

It should be understood that the probes need not be nucleic acid probes but may also be other polymers such as peptides. Peptide probes may be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes must be carefully selected to have bonding affinity to the compound whose concentration they are to be used to measure.

FIG. 1 illustrates an overall system 100 for forming and analyzing arrays of biological materials such as RNA or DNA. A part of system 100 is a chip design database 102. Chip design database 102 includes information about chip designs and the purposes of chips. Chip design database 102 facilitates large scale design, construction, and processing of chips.

A chip design system 104 is used to design arrays of polymers such as biological polymers such as RNA or DNA. Chip design system 104 may be, for example, an appropriately programmed Sun Workstation or personal computer or workstation, such as an IBM PC equivalent, including appropriate memory and a CPU. Chip design system 104 obtains inputs from a user regarding chip design objectives including characteristics of genes of interest, and other inputs regarding the desired features of the array. All of this information may be stored in chip design database 102. Optionally, chip design system 104 may obtain information regarding a specific genetic sequence of interest from chip design database 102 or from external databases such as GenBank. The output of chip design system 104 is a set of chip design computer files in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files. The chip design computer files form a part of chip design database 102. Systems for designing chips for sequence determination and expression analysis are disclosed in U.S. Pat. No. 5,571,639 and in PCT application WO 97/10365, the contents of which are herein incorporated by reference.

The chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA. The mask design system designs the lithographic masks used in the fabrication of probe arrays. The mask design system generates mask design files that are then used by a mask construction system (not shown) to construct masks or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.

The masks are used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip. The synthesis system includes a light source and a chemical flow cell on which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations. The substrates fabricated by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready for application of a target sample.

Information about the mask design, mask construction, probe array synthesis and analysis systems is presented by way of background. A biological source 112 is, for example, tissue from a plant or animal. Various processing steps are applied to material from biological source 112 by a sample preparation system 114. These steps may include, e.g., isolation of mRNA, precipitation of the mRNA to increase concentration, etc., synthesis of cDNA from mRNA, PCR amplification of fragments of interest. The result of the various processing steps is a target ready for application to the chips produced by the synthesis system.

The prepared samples include monomer nucleotide sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116, the nucleotides may or may not bond to the probes. The nucleotides have been tagged with fluorescein labels to determine which probes have bonded to nucleotide sequences from the sample. The prepared samples will be placed in a scanning system 118. Scanning system 118 includes a detection device such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled receptors have bound to the substrate. The output of scanning system 118 is an image file(s) indicating, in the case of fluorescein labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. These image files also form a part of database 102. Since higher photon counts will be observed where the labeled receptor has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the receptor.

The image files and the design of the chips are input to an analysis system 120 that, e.g., calls base sequences, or determines expression levels of genes or expressed sequence tags (ESTs). The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that would result from the transcription of the gene or EST. Such analysis techniques are disclosed in WO97/10365, the contents of which are herein incorporated by reference. Base calling techniques are described in WO 95/11995, the contents of which are herein incorporated by reference.

Chip design system 104, analysis system 120 and control portions of exposure system 116, sample preparation system 114, and scanning system 118 may be appropriately programmed computers such as a Sun workstation or IBM-compatible PC. An independent computer for each system may perform the computer-implemented functions of these systems or one computer may combine the computerized functions of two or more systems. One or more computers may maintain chip design database 102 independent of the computers operating the systems of FIG. 1 or chip design database 102 may be fully or partially maintained by these computers.

FIG. 2A depicts a block diagram of a host computer system 210 suitable for implementing the present invention. Host computer system 210 includes a bus 212 which interconnects major subsystems such as a central processor 214, a system memory 216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via an I/O adapter 218, a SCSI host adapter 236, and a floppy disk drive 238 operative to receive a floppy disk 240. SCSI host adapter 236 may act as a storage interface to a fixed disk drive 242 or a CD-ROM player 244 operative to receive a CD-ROM 246. Fixed disk 242 may be a part of host computer system 210 or may be separate and accessed through other interface systems. A network interface 248 may provide a direct connection to a remote server via a telephone link or to the Internet. Network interface 248 may also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) may be connected in a similar manner.

Also, it is not necessary for all of the devices shown in FIG. 2A to be present to practice the present invention, as discussed below. The devices and subsystems may be interconnected in different ways from that shown in FIG. 2A. The operation of a computer system such as that shown in FIG. 2A is readily known in the art and is not discussed in detail in this application. Code to implement the present invention may be operably disposed or stored in computer-readable storage media such as system memory 216, fixed disk 242, CD-ROM 246, or floppy disk 240.

FIG. 2B depicts a network 260 interconnecting multiple computer systems 210. Network 260 may be a local area network (LAN), wide area network (WAN), etc. Bioinformatics database 102 and the computer-related operations of the other elements of FIG. 2B may be divided amongst computer systems 210 in any way with network 260 being used to communicate information among the various computers. Portable storage media such as floppy disks may be used to carry information between computers instead of network 260.

Overall Description of Database

Chip design database 102 is preferably a relational database with a complex internal structure. The structure and contents of chip design database 102 will be described with reference to a logical model that describes the contents of tables of the database as well as interrelationships among the tables. A visual depiction of this model will be an Entity Relationship Diagram (ERD) which includes entities, relationships, and attributes. A detailed discussion of ERDs is found in "ERwin version 3.0 Methods Guide" available from Logic Works, Inc. of Princeton, N.J., the contents of which are herein incorporated by reference. Those of skill in the art will appreciate that automated tools such as Developer 2000 available from Oracle will convert the ERD from FIG. 4 directly into executable code such as SQL code for creating and operating the database.

FIG. 3 is a key to the ERD that will be used to describe the contents of chip design database 102. A representative table 302 includes one or more key attributes 304 and one or more non-key attributes 306. Representative table 302 includes one or more records where each record includes fields corresponding to the listed attributes. The contents of the key fields taken together identify an individual record. In the ERD, each table is represented by a rectangle divided by a horizontal line. The fields or attributes above the line are key while the fields or attributes below the line are non-key. An identifying relationship 308 signifies that the key attribute of a parent table 310 is also a key attribute of a child table 312. A non-identifying relationship 314 signifies that the key attribute of a parent table 316 is also a non-key attribute of a child table 318. Where (FK) appears in parentheses, it indicates that an attribute of one table is a key attribute of another table. For both the non-identifying and the identifying relationship, one record in the parent table corresponds to one or more records in the child table.

At the highest level, chip design database 102 may be understood as providing a relational structure among genomic items, sequence items, and tiling items, as these terms are defined herein by use of example. Genes are characterized by their sequence, location on the genome, and function. Genomic items are herein defined as references to genes, gene clusters, expressed sequence tags (ESTs), and EST clusters by location and/or function but not by sequence. Sequence items are herein defined to be any oligonucleotide sequence or group of oligonucleotide sequences that may or may not by itself have biological meaning. A sequence item may be a long sequence of genomic DNA including more than one exon of biological significance. Alternatively, an exon may include many sequence items. Also, a genomic item may have multiple associated sequence items or groups of sequence items because of changes of sequence information stored in public genomic databases. Genomic items and sequence items are tracked separately by database 102. There is a many-to-many relationship between genomic items and sequence items which is captured by the internal structure of chip design database 102.

Tiling items represent groupings of probes on a chip. A tiling item may be a pair or group of pairs of match and mismatch probes for an expression analysis chip. For sequencing chips, a tiling item may be an atom including a group of probes designed to detect a mutation or call a base at a particular base position. Tiling items are designed to interrogate sequence items, e.g., determine expression or call bases. However, a single tiling item may be used to interrogate more than one sequence item. For example, consider that a sequence item may identify a group of sequences or a single sequence that is longer than the length of a probe. Conversely, certain difficult sequences, e.g., those including long runs of the same base, may require more than one tiling item for interrogation. There is thus a many-to-many relationship between sequence item and tiling item which is captured by the internal structure of chip design database 102.

Tiling items include probe pair sets. A probe pair set represents a single sequence on a chip and includes probe pairs. Chip design database 102 thus enables one to follow the various interrelationships described above and, e.g., associate a particular probe on a chip with the associated probe pair, probe pair set, tiling item, sequence item, genomic item, etc. The associated genomic item may be a gene cluster associated with a particular gene and an accession number within some biological database. All of these relationships are preferably captured within chip design database 102.

Chip design database 102 also preferably includes information about what is contained within a particular chip design. There also may be information about customer orders for a particular chip design including what [illegible] were to be tested by a particular chip design, who ordered the chip design, etc.

Applications of Chip Design Database

Chip design database 102 is a highly useful tool in designing and tracking existing chip designs. One application is storing intermediate data about genomic items,
sequence items, etc. that is input or generated during the course of generating a chip design. Scientists may request that particular genes or sequences be investigated. An intermediate step in determining the chip design will be populating chip design database 102 with the information identifying genes or sequences to be investigated. Since chip design database 102 preserves the information about the genomic items that are investigated by a particular chip design, it is also very useful in finding existing chip designs that are capable of servicing new requests. Also, chip design database 102 may be used after chip design is complete to answer questions about which genomic items and/or sequence items are interrogated by a particular probe or tiling item.

Database Model

FIG. 4 is an entity relationship diagram (ERD) showing elements of chip design database 102 according to one embodiment of the present invention. Each rectangle in the diagram corresponds to a table in database 102. For each rectangle, the title of the table is listed above the rectangle. Within each rectangle, columns of the table are listed. Above a horizontal line within each rectangle are listed key columns, columns whose contents are used to identify individual records in the table. Below this horizontal line are the names of non-key columns. The lines between the rectangles identify the relationships between records of one table and records of another table. First, the relationships among the various tables will be described. Then, the contents of each table will be discussed in detail.

The tables of database 102 may be understood as belonging to different groups that relate to purpose. In FIG. 4, each table is denoted with a capital letter "A" through "F" to denote membership in a group. Group A includes sequence and biological data. Group B includes design request information. Group C includes chip design information such as which probes are included and how they are laid out. Group D includes design specification information including information used in selecting probes. Group E includes information about compliance to customer contracts for chip design and production. Group F includes information about sequences requested but not included in a final chip design because of difficulty in selecting probes that would be effective in investigating them.

The interrelationships and general contents of the tables of database 102 will be described first. Then a chart will be presented listing and describing all of the fields of the various tables.

A tiling item table 402 lists the various tiling items. Each record in tiling item table 402 identifies a tiling item for a particular chip design. Each tiling item has an associated tiling item type listed in a tiling item type table 406. Examples of tiling item type include "probe pairs" which would identify a perfect match/mismatch probe pair or "atom" which would indicate a group of probes used for determining a mutation or calling a base at a particular base position. Each tiling item has one or more associated probes which are listed in a probe table 408.

A tiling item may itself be an aggregation of other tiling items. A tiling composition table 409 includes records that associate aggregate tiling items with the tiling items they include.

Associated with each probe listed in probe table 408 is a probe role record in a probe role table 410. The probe role record tells, e.g., if a particular probe in a perfect match/mismatch pair is itself the perfect match or the mismatch. Further associated with each probe is a probe specification record in a probe specification table 412. The probe specification record tells the length of the probe and the orientation of the probe. The orientation of the probe (sense or antisense) is identified within the probe specification record by reference to a record in a sense type table 414 which lists both orientations.

A chip design table 416 lists chip designs. Associated with each chip design is a plurality of tiling items in tiling item table 402. Also associated with each chip design is a chip design type as listed in a chip design type table 418. Examples of chip design types are "expression analysis" or "mutation detection." Each chip design may have many associated chip design names listed in a chip design name table 420. The names may include informal names used within the organization or formal names used in formal interorganization communications.

Chip designs may be aggregated into chip design sets which are listed in a chip composition table 422. Each record of chip composition table 422 identifies a chip design set which may include more than one chip design listed in chip design table 416. A chip design set may characterize a group of chips used together for a particular purpose such as identifying expression of oncogenes or tumor suppressors in humans.

An exception table 424 lists sequences whose investigation was requested but for which optimal probes were not included in the design. Each exception is associated with a particular combination of sequence and tiling item and has an associated exception type listed in an exception type table 426. One type of exception, referred to as an "[illegible]" exception, is noted when preferred rules for probe selection have not been followed because they would not result in an adequate set of probes in the chip design for a particular sequence. An "S" exception denotes that the sequence is very similar to another sequence and that sequences had to be grouped together to find acceptable probe sets so that certain probes interrogate more than one sequence. An "I" exception indicates that the probe set is incomplete, although the probes that are included in the set interrogating the sequence are of high quality. A "B" exception indicates that all probe selection rules have been dropped and that the probes are of low quality. A "G" indicates that the sequence overlaps with another sequence.

There is a sequence item table 426 that lists all the sequence items of chip design database 102. Associated with each listed sequence item is a sequence type from sequence type table 428. Examples of sequence type include "sequence" and "group of sequences." A sequence composition table 430 is used to aggregate sequences into groups of sequences. Each group listed in sequence composition table 430 has associated sequences in sequence item table 426.

There is a sequence derivation table 432 which lists derivations used to transform one sequence listed in sequence item table 426 into another. Each derivation has a derivation type listed in a derivation type table 434. Examples of derivation types include "removal of ambiguities," or "change in GenBank information." An allele table 436 lists polymorphisms for some of the sequences listed in sequence item table 426.

A sequence overlap table 438 lists overlaps between sequences of sequence item table 426. These overlaps are important to know for the probe selection process. The overlaps are determined by a process known as BLAST comparison. The result of a BLAST comparison is a description of the match quality between the compared sequences. This match quality is stored in sequence overlap table 438.
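
The chain of associations described above (probe to probe role, probe to tiling item, tiling item to tiling item type and to chip design, chip design to chip design type) can be sketched as follows. This is a hedged illustration only, not the schema of FIG. 4. All table and column names and the sample rows are hypothetical, and each lookup table is reduced to an identifier and a name. Each foreign key appears as a non-key column of the child table, which corresponds to a non-identifying relationship in the ERD key of FIG. 3.

```python
import sqlite3

# Hypothetical, simplified schema: names and columns are assumptions made
# for illustration and are not copied from FIG. 4.
SCHEMA = """
CREATE TABLE chip_design_type (
    chip_design_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE chip_design (
    chip_design_id INTEGER PRIMARY KEY,
    chip_design_type_id INTEGER NOT NULL
        REFERENCES chip_design_type(chip_design_type_id)
);
CREATE TABLE tiling_item_type (
    tiling_item_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE tiling_item (
    tiling_item_id INTEGER PRIMARY KEY,
    chip_design_id INTEGER NOT NULL REFERENCES chip_design(chip_design_id),
    tiling_item_type_id INTEGER NOT NULL
        REFERENCES tiling_item_type(tiling_item_type_id)
);
CREATE TABLE probe_role (
    probe_role_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE probe (
    probe_id INTEGER PRIMARY KEY,
    tiling_item_id INTEGER NOT NULL REFERENCES tiling_item(tiling_item_id),
    probe_role_id INTEGER NOT NULL REFERENCES probe_role(probe_role_id)
);
"""


def build_example():
    """One expression-analysis chip design with a single probe pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO chip_design_type VALUES (1, 'expression analysis')")
    conn.execute("INSERT INTO chip_design VALUES (100, 1)")
    conn.execute("INSERT INTO tiling_item_type VALUES (1, 'probe pairs')")
    conn.execute("INSERT INTO tiling_item VALUES (1000, 100, 1)")
    conn.executemany("INSERT INTO probe_role VALUES (?, ?)",
                     [(1, "perfect match"), (2, "mismatch")])
    conn.executemany("INSERT INTO probe VALUES (?, ?, ?)",
                     [(1, 1000, 1), (2, 1000, 2)])
    return conn


def trace_probe(conn, probe_id):
    """Follow a probe to its role, tiling item type, and chip design type."""
    row = conn.execute(
        """SELECT pr.name, tt.name, ct.name
           FROM probe AS p
           JOIN probe_role AS pr ON pr.probe_role_id = p.probe_role_id
           JOIN tiling_item AS t ON t.tiling_item_id = p.tiling_item_id
           JOIN tiling_item_type AS tt
             ON tt.tiling_item_type_id = t.tiling_item_type_id
           JOIN chip_design AS c ON c.chip_design_id = t.chip_design_id
           JOIN chip_design_type AS ct
             ON ct.chip_design_type_id = c.chip_design_type_id
           WHERE p.probe_id = ?""",
        (probe_id,),
    ).fetchone()
    if row is None:
        return None  # no such probe
    return {"probe_role": row[0],
            "tiling_item_type": row[1],
            "chip_design_type": row[2]}
```

A call such as trace_probe(conn, 2) on the sample data walks from the mismatch probe back to the expression analysis chip design that contains it, which is the kind of question the database is described as answering once a design is complete.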

During the chip design process, sequences may be the basis for creating tiling items. Sequence information is also the basis for pruning the set of probes that are included in a chip design. Pruning is a step of probe selection. Objectives of pruning may include: assuring that no probe is a duplicate of another probe in a probe pair set, assuring that no probe is the same as any other probe in a chip or set of chips, or assuring that a probe is not a duplicate of any probe that would be used to interrogate a set of sequences larger than the set investigated by a chip or set of chips. For example, it may be useful once the entire human genome is known to prune probe sets so that no probe is used that would interrogate more than one sequence in the genome. The more that is pruned against, the higher the quality of the resulting chip design is since ambiguity in analysis results is greatly reduced. To facilitate pruning, chip database 102 provides a pruning set table 440 which lists pruning sets. Each pruning set has an associated chip design in chip design table 416. A pruning map table 442 lists correlations between particular sequence items and pruning sets and implements the many-to-many relation that exists between sequence item table 426 and pruning set table 440.

A genomic item table 444 lists genomic items. Each listed genomic item may be a gene or EST or an aggregate of genes or ESTs. A genomic composition table 446 lists the relationships between aggregations of genes and/or ESTs and their components. A genomic name table 448 lists names of genomes. Each name may apply to more than one genome. Similarly, each genome may have more than one name. A genomic name map table 450 implements the many-to-many relationships between genomes and names.

A genomic type table 452 lists the various types of genome such as "gene," "gene cluster," "EST," and "EST cluster." Each genomic item in genomic item table 444 has an associated genomic type in genomic type table 452. A species table 454 lists the species associated with the genomic items. Each genomic item in genomic item table 444 has an associated species in species table 454.

It is often useful to know the position of a genomic item in a chromosome. A chromosome table 456 lists various chromosomes. Each record in a chromosome map table 458

the objective. For example one project may investigate genes relating to high blood pressure while another project investigates genes relating to breast cancer. Typically, a project will be the impetus for designing a chip or a set of chips. A project table 476 lists such projects. A project map table 478 lists associations between projects and genomic items and like the other map tables implements a many-to-many relationship between genomic items and projects.

The chip design process may originate with a project assignment which specifies genomic items, or may alternatively originate with a design request that specifies sequences to be interrogated by probes on the chip. A design request table 480 lists such design requests. Each design request may have many associated design request items listed in a design request item table 482. The records of design request item table 482 each identify a requested sequence item.

All requested sequences may or may not fit in the final chip design. If a requested sequence is not found in a chip design, this is recorded in a reject map table 484. Each record in reject map table 484 identifies a sequence that was requested to be included in a particular chip design but left out. Each such reject record has an associated reject type selected from the types listed in a reject type table 486.

Associated with each design request or project is a customer as listed in a customer table 488. Each customer may have one or more associated design requests, annotations, or projects as listed in tables 480, 468, and 476 respectively. A customer may also be the source of one or more sequence items as found in a sequence item table 426. A source map table 490 implements the many-to-many relationship between sequence items and customers. Each customer is associated with a site as recorded in a site table 492.

There may also be associations between design requests and projects. Projects may have one or more associated design requests and design requests may have one or more associated projects. A design map table 493 lists associations between design requests and projects.

Companies may have one or more sites and are listed in a company table 494. Biological databases listed in biologi-

indicates which chromosome a genomic item is located in ca i database table 462 may be proprietary to companies 

and where on the chromosome the genomic item would be listed in company table 494. By providing a relationship 

found. between these two tables, chip design database 102 allows 

It is also useful to store information about database 45 the chip designer to keep track of genomic item information 

references for genomic items. The records of biological that should be kept proprietary to particular orderers. Source 

database reference table 460 each include information as map table 490 similarly assists in maintaining the necessary 

would be found in one database about one genomic item. confidentiality for customer-originated sequence informa- 

The databases themselves are listed in a biological database tion. A company may request specific probes to be included 

table 462. Representative databases include GenBank, 50 in a chip. These requests are listed in a probe request table 

Entrez, and TIGR. 491. An order limits table 493 lists the contractual limita- 

Genomic items are themselves related to one another by tions that apply to chip design work to be done for particular 

functional homology. Genomic items may be grouped by the companies. For example, a company may be limited to 

functions performed by proteins that result from their investigate a certain number of genes per chip, or be limited 

expression. A homology function table 464 lists different 55 to request a certain number of probes per chip, 

functions in a cell. A homology map table 466 lists asso- A communications table 496 lists communications 

ciations between the listed homologies and genomic items between the chip designer and customer about a particular 

listed in genomic item table 444. design request. Each design request may have one or more 

Genomic items listed in genomic item table 444 may also associated communications. Each communication listed in 

have associated annotation information. An annotation table so communications table 496 has an associated communica- 

468 lists annotations for genomic items. Each record in an tions type as listed in a communications type table 498. 

annotation map table 470 associates an annotation and a Different communication types may correspond to different 

genomic item. A comment found in an annotation may be stages in the process. For example, the different types may 

backed up by a citation to the literature listed in a citation include "chip request," "sequences updated," "sequences 

table 472. 65 incomplete," etc. 

Genomic items may be grouped into sets corresponding to A classification table 500 lists classifications of item 
projects where each project has a particular investigative requests. Such classifications represent functional hierar- 
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chies. Classifications may include, e.g., tissue types or 
protein family names. A classification map table 502 asso- 
ciates item requests with classifications. 

The many-to-many relationship between genomic items 
and sequence items is implemented by a sequence map table 5 
504 which lists associations between genomic items and 
sequence items. The many-to-many relationship between 
sequence items and tiling items and thus probes is imple- 



12 



mented by a sequence used map 506 which lists associations 
between sequence items and tiling items. A control map 
table 508 similarly implements a many-to-many relationship 
between sequence items and tiling types. 
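
By way of illustration only, the map-table pattern described above can be sketched in SQL. The minimal Python example below uses the standard sqlite3 module to build an in-memory database holding a genomic item table, a sequence item table, and a sequence map table that joins them. The lower-case table and column names, the sample rows, and the helper functions sequences_for and genomic_items_for are assumptions made for this sketch; only the many-to-many structure follows the description.

```python
import sqlite3

# In-memory database; all names below are hypothetical, simplified stand-ins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genomic_item (
    genomic_item_id INTEGER PRIMARY KEY,
    name            TEXT
);
CREATE TABLE sequence_item (
    sequence_id INTEGER PRIMARY KEY,
    sequence    TEXT
);
-- Map table: one row per (sequence item, genomic item) association.
CREATE TABLE sequence_map (
    sequence_id     INTEGER NOT NULL REFERENCES sequence_item(sequence_id),
    genomic_item_id INTEGER NOT NULL REFERENCES genomic_item(genomic_item_id),
    PRIMARY KEY (sequence_id, genomic_item_id)
);
""")

# Invented sample data: gene A has two sequences, and sequence 10 is shared.
conn.executemany("INSERT INTO genomic_item VALUES (?, ?)",
                 [(1, "gene A"), (2, "EST cluster B")])
conn.executemany("INSERT INTO sequence_item VALUES (?, ?)",
                 [(10, "ACGTACGT"), (11, "TTGACCA")])
conn.executemany("INSERT INTO sequence_map VALUES (?, ?)",
                 [(10, 1), (11, 1), (10, 2)])


def sequences_for(genomic_item_id):
    """Return the sequence identifiers associated with one genomic item."""
    rows = conn.execute(
        "SELECT sequence_id FROM sequence_map "
        "WHERE genomic_item_id = ? ORDER BY sequence_id",
        (genomic_item_id,)).fetchall()
    return [row[0] for row in rows]


def genomic_items_for(sequence_id):
    """Return the genomic item identifiers associated with one sequence."""
    rows = conn.execute(
        "SELECT genomic_item_id FROM sequence_map "
        "WHERE sequence_id = ? ORDER BY genomic_item_id",
        (sequence_id,)).fetchall()
    return [row[0] for row in rows]
```

The same pattern, a map table holding one row per association, would apply to pruning map table 442, project map table 478, and source map table 490.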

Database Contents 

The contents of the tables introduced above will now be 
presented in greater detail in the following chart. 



TABLE                     FIELD                             CONTENTS
(chromosome table)        CDfldChromosomeID                 Identification number for chromosome.
                          CDfldChromosomeName               Name of chromosome.
CDtblChromosomeMap        GENOMIC_ItemID(FK)                Reference to genomic item in genomic item table.
                          CDfldChromosomeID(FK)             Reference to chromosome table.
                          CDfldChroMapCytogenicLocation     Cytogenic location.
                          CDfldChroMapGeneticLocation       Genetic location.
                          CDfldChroMapPhysicalLocation      Physical location of genomic item on chromosome.
GENOMIC_NAME              GENOMIC_ID(IE1.1)                 Reference to genomic item table.
                          GENOMIC_Name                      Name of genome.
                          CDfldGenomicNameLong              Longer version of genomic name.
(species table)           SPECIES_ID                        Species identification.
                          SPECIES_Type                      Type of species.
                          SPECIES_CommonName                Common name of species.
(genomic name map table)  GENOMIC_ID(FK)                    Reference to genomic name table.
                          GENOMIC_ItemID(FK)                Reference to genomic item table.
CDtblHomologyMap          (illegible)                       Points to genomic item in genomic item table.
                          GENCOMP_AggregateID               Identifies aggregation of genomic items.
GENOMIC_TYPE              GENOMICTYPE_ID                    Identifier for genomic type.
                          GENOMICTYPE_Name                  Name of genomic type.
                          CDfldgenomictypedescription       Description of genomic type.
GENOMIC_ITEM              GENOMIC_ItemID                    Genomic item identifier.
                          SPECIES_ID(FK)                    Reference to species table.
                          GENOMIC_ItemId(FK)(IE1.1)         Reference to genomic type table.
CDtblHomologyMap          CDfldHomologyID(FK)               Homology identifier.
                          GENOMIC_ItemId(FK)                Reference to genomic item table.
CDtblHomologyFunction     CDfldHomologyID                   Homology identifier.
                          CDfldHomologyName                 Name of homology.
                          CDfldHomologyDescription          Description of homology.
BIOLOGICAL_DB_REFERENCE   BIODBREF_ID                       Identifier for biological database reference.
                          GENOMIC_ItemID(FK)                Reference to genomic item table.
                          (illegible)                       Reference to biological database table.
                          BIODBREF_Value(AK1.1)             Reference value, e.g., accession number.
                          BIODBREF_Description              Description of database reference.
BIOLOGICAL_DB             BIODBREF_ID                       Biological database identifier.
                          COMPANY_ID(FK)                    Reference to company table.
                          BIODB_Name                        Name of database.
                          BIODB_ReferenceType               Type of reference.
                          CDfldBioDBWebSite                 Website for database.
ANNOTATION                ANNOTATION_ID                     Annotation identifier.
                          ANNOTATION_Description            Description of annotation.
ANNOTATION_MAP            ANNOTATION_ID(FK)                 Reference to annotation table.
                          GENOMIC_ItemID(FK)                Reference to genomic item table.
                          CUSTOMER_ID(FK)                   Reference to customer table.
                          CITATION_ID(FK)                   Reference to citation table.
                          ANNOTATIONMAP_Rating              Indication of quality of annotation.
CITATION                  CITATION_ID                       Citation identifier.
                          CITATION_Source                   Source of citation.
SEQUENCE_ITEM             SEQUENCE_ID                       Sequence identifier.
                          SEQTYPE_ID(FK)                    Reference to sequence type table.
                          SEQUENCE_Sequence                 Sequence (may be very long field).
SEQUENCE_MAP              SEQUENCE_ID(FK)                   Reference to sequence item table.
                          GENOMIC_ItemID(FK)(IE1.1)         Reference to genomic item table.
CDtblAllele               CDfldAlleleID                     Allele identifier.
                          SEQUENCE_ID(FK)                   Reference to sequence item table.
                          CDfldAlleleOffset                 Position of polymorphism.
                          CDfldAlleleBase                   Base defined by polymorphism.
(reject map table)        SEQUENCE_ID(FK)(IE2.1)            Reference to sequence item table.
                          CHIP_DesignID(FK)(IE1.1)          Reference to chip design table.
                          REJECTTYPE_ID(FK)                 Reference to reject type table.
(reject type table)       REJECTTYPE_ID                     Reject type identifier.
                          REJECTTYPE_Name                   Name of reject type.
                          REJECTTYPE_Description            Description of reject type.
SEQUENCE_TYPE             SEQTYPE_ID                        Sequence type identifier.
                          SEQTYPE_Name                      Name of sequence type.
                          CDfldseqtypedescription           Description of sequence type.
SEQUENCE_DERIVATION       SEQUENCE_ID(FK)                   Original sequence.
                          SEQCOMP_ElementID(FK)             Derived sequence.
                          CDfldDeriveTypeID(FK)             Reference to derivation type table.
                          CDfldSeqDeriveAlias               Suffix attached to name of derived sequence.
                          CDfldSeqDeriveOffset              Offset between original sequence and derived sequence.
CDtblDerivationType       CDfldDeriveTypeID                 Derivation type identifier.
                          CDfldDeriveName                   Name of derivation type.
                          CDfldDeriveDescription            Description of derivation type.
                          String                            Suffix associated with derivation type.
SEQUENCE_OVERLAP          SEQUENCE_ID(FK)                   First sequence compared.
                          SEQSEQOVERLAP_ID2                 Second sequence compared.
                          SEQOVERLAP_MatchPercent           Percentage match between compared sequences.
                          SEQOVERLAP_MatchSequence          Sequencing common between two compared sequences.
                          CDfldSeqOverlapOffset             Offset value if second compared sequence is offset from first compared sequence.
SEQUENCE_COMPOSITION      SEQCOMP_ElementID(FK)             Identifier of sequence included in aggregate.
                          SEQCOMP_AggregateID               Identifier of aggregate of sequences.
PRUNING_MAP               PRUNINGSET_ID(FK)                 Pruning set identifier.
                          SEQUENCE_ID(FK)                   Reference to sequence item table.
PRUNING_SET               PRUNINGSET_ID                     Pruning set identifier.
                          PRUNINGSET_NAME                   Name of pruning set.
                          PRUNINGSET_Description            Description of pruning set.
CHIP_DESIGN               CHIP_DesignID                     Chip design identifier.
                          COMPANY_ID(FK)                    Reference to company table.
                          CHIP_TypeID(FK)                   Reference to chip type table.
                          CHIP_FeatureSize                  X dimension size of chip features, e.g., 25 or 50 µm.
                          CHIP_MaskID                       Mask identifier associated with mask for chip.
                          CHIP_FeatureCountY                Feature size and Y direction.
                          CHIP_PartNumber                   Part number to identify chip.
                          CHIP_Code                         Another chip designator.
                          CHIP_GridX                        Number of cells in the X direction.
                          CHIP_SizeUnit                     Units used for feature size, typically microns.
                          CHIP_GridY                        Number of cells in the Y direction.
                          Chip Description                  Description of chip.
                          PRUNINGSET_ID(FK)                 Reference to pruning set table.
CHIP_DESIGN_TYPE          CHIPTYPE_ID                       Chip type identifier.
                          CHIPTYPE_Name                     Name of chip type.
                          CDfldchiptypedescription          Description of chip type.
CDtblChipDesignName       CHIP_DesignID(FK)                 Reference to chip design table.
                          CDfldChipDesignName               Name of chip design.
CHIP_COMPOSITION          CHIP_DesignID(FK)                 Identifier of chip set.
                          CHIPCOMP_ElementID                Identifier of chip in chip set.
TILING_ITEM               TILING_ID                         Tiling item identifier.
                          CHIP_DesignID(FK)                 Reference to chip design table.
                          TILING_TypeID(FK)                 Reference to tiling type table.
TILING_TYPE               TILINGTYPE_ID                     Tiling type identifier.
                          TILINGTYPE_Name                   Name of tiling type.
                          TILINGTYPE_DesType                Code for tiling type.
                          TILINGTYPE_Set                    Description of tiling type.
CONTROL_MAP               TILING_TYPE_ID(FK)                Reference to tiling type table.
                          SEQUENCE_ID(FK)                   Reference to sequence item table.
TILING_COMPOSITION        TILECOMP_AggregateID(FK)          Identifier for aggregation of tiling items.
                          TILECOMP_ELEMENTID(FK)            Identifier for tiling item within aggregation.
(probe table)             PROBEID                           Probe identifier.
                          PROBEROLE_ID(FK)                  Reference to probe role table.
                          TILING_ID                         Reference to tiling item table.
                          PROBE_Sequence                    Probe sequence.
                          PROBESPEC_ID(FK)                  Probe specification identifier.
                          PROBE_X                           X position of probe on chip.
                          PROBE_Y                           Y position of probe on chip.
                          Number                            Sequence position of probe.
PROBE_ROLE                PROBEROLE_ID                      Probe role identifier.
                          PROBEROL_Name                     Name of probe role, e.g., perfect match or mismatch.
                          PROBEROLE_DesType                 Code representing probe role name.
                          PROBEROL_Control                  Indicates whether probe is a control probe.
PROBE_SPEC                PROBESPEC_ID                      Probe specification identifier.
                          SENSETYPE_ID(FK)(AK1.3)           Sense type indication, e.g., sense or antisense; reference to sense type table.
                          PROBESPEC_Length(AK1.1)           Length of probe.
                          PROBESPEC_SubstPosition(AK1.2)    Position at which mismatch is made for a mismatch probe.
SENSE_TYPE                SENSETYPE_ID                      Sense type identifier.
                          SENSETYPE_Name                    Name of sense type, e.g., sense or antisense.
                          SENSETYPE_Description             Longer version of sense or antisense.
                          SENSETYPE_Sign                    Positive or negative, depending on whether sense or antisense.
SEQUENCE_USED             SEQUENCE_ID(FK)                   Reference to sequence item table.
                          TILING_ID(FK)                     Reference to tiling item table.
CRITERIAN                 EXCEPTION_ID                      Exception identifier.
                          SEQUENCE_ID(FK)                   Reference to sequence item table.
                          EXCEPTIONTYPE_ID(FK)              Reference to exception type table.
                          TILING_ID(FK)                     Reference to tiling item table.
CRITERIAN2                EXCEPTIONTYPE_ID                  Exception type identifier.
                          CRITERIUMTYPE_Extension           Suffix to identify criterium type.
                          EXCEPTIONTYPE_Name                Name of criterium type.
                          CRITERIUMTYPE_Description         Description of criterium type.
                          CRITERIUM_Cluster                 Whether criterium type is part of a cluster.
CUSTOMER                  CUSTOMER_ID                       Customer identifier.
                          CUSTOMER_SiteID(FK)               Reference to site table.
                          CUSTOMER_ContactName              Name of customer contact.
                          CUSTOMER_PhoneNumber              Phone number of customer contact.
                          Cofldpersonemail                  E-mail address of customer contact.
                          COfldPersonLastName               Last name of customer contact.
SITE                      SITE_ID                           Site identifier.
                          SITE_Address                      Address of site.
                          SITE_PhoneNumber                  Phone number of site.
                          COMPANY_ID(FK)                    Reference to company table.
COMPANY                   COMPANY_ID                        Company identifier.
                          COMPANY_Name                      Name of company.
PROBE_REQUEST             PROBEREQ_ID                       Probe request identifier.
                          COMPANY_ID(FK)                    Reference to company table.
                          PROBEREQ_ChipID                   Chip that probe request is made for, reference to chip design table.
                          PROBEREQ_ProbeId                  Identifier of probe that was requested, reference to probe table.
OUTER_LIMITS              COMPANY_ID(FK)                    Reference to company table.
                          LIMIT_GenesPerChip                Maximum number of genes per chip.
                          LIMIT_ProbeRequestPerChip         Maximum number of probes per chip.
CDtblSourceMap            SEQUENCE_ID(FK)                   Reference to sequence item table.
                          CUSTOMER_ID(FK)                   Reference to customer table.
                          CDfldSourceMapDateAcquired        Date source map acquired.
                          CDfldSourceMapAnnealingTemp       Annealing temperature for sequence.
                          CDfldSourceMapConfidence          Confidence level in sequence map.
                          CDfldSourceMapStartMaterial       Pertains to method of creation of map.
                          String                            Comment.
PROJECT                   PROJECT_ID                        Project identifier.
                          CUSTOMER_ID(FK)                   Reference to customer table.
                          PROJECT_DateCreated               Date of project creation.
                          PROJECT_Description               Description of project.
PROJECT_MAP               PROJECT_ID(FK)                    Reference to project table.
                          GENOMIC_ItemId(FK)                Reference to genomic item table.
COtblDesignRequest        COfldDesignRequestID              Design request identifier.
                          CUSTOMER_ID(FK)                   Customer identifier.
                          CHIP_DesignID(FK)                 Reference to chip design table.
                          COfldDesignRequestDateReceived    Date request received.
                          COfldDesignRequestPO              Purchase order number.
                          COfldDesignRequestGenesPerChip    Number of genes per chip requested.
                          COfldDesignRequestProbesPerGene   Number of probes per gene requested.
                          COfldDesignRequestFeatureSize     Feature size requested, e.g., 25 or 50 µm.
                          COfldDesignRequestFeatureCount    How many features will fit on chip.
                          COfldDesignRequestDescription     Description of requested chip.
                          COfldDesignRequestInstructions    Customer instructions.
                          String                            Orientation of target sequences that are to be read with the chip.
DESIGN_MAP                PROJECT_ID(FK)                    Reference to project table.
                          COfldDesignRequestID(FK)          Reference to design request table.
COMMUNICATIONS            COMM_ID                           Communications identifier.
                          COfldDesignRequestID(FK)          Reference to design request table.
                          COMMTYPE_ID(FK)(IE1.1)            Reference to communication type table.
                          COMM_Date                         Date of communication.
                          COMM_Description                  Description of communication.
COMM_TYPE                 COMMTYPE_ID                       Communication type identifier.
                          COMMTYPE_Name                     Name of communication type.
                          Cofldcommtypedescription          Description of communication type.
ITEM_REQUESTED            ITEM_RequestedId                  Requested item identifier.
                          COfldDesignRequestID(FK)          Reference to design request table.
                          SEQUENCE_ID(FK)                   Reference to sequence item table.
                          ITEM_Start                        Permissible starting point in submitted sequence.
                          ITEM_Stop                         Permitted stopping point in sequence.
                          ITEM_Alias                        Another name for specified sequence.
                          ITEM_Description                  Description of sequence.
                          ITEM_Reverse                      Whether sequence is to be reversed before placement on chip.
                          import Qualifier                  Import qualifier.
                          Coflditemrequestedprobeperitem    Override to number of probes per gene in design request table.
                          Coflditemrequestedtilereverse     Whether particular sequence is to be tiled in sense or antisense direction.
Classification            CLASSIFICATION_ID                 Classification identifier.
                          CLASS_Keyword(AK1.1)              Description of classification.
CLASS_MAP                 ITEM_RequestedID(FK)              Reference to item request table.
                          CLASSIFICATION_ID(FK)             Reference to classification table.
                          CLASSMAP_Group                    Grouping together of classification specified by customer.
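
As a hedged illustration of how a few rows of the foregoing chart might translate into SQL, the sketch below builds simplified chip design, tiling item, probe, and sequence used tables with Python's standard sqlite3 module. The lower-case names, the reduced column set, the sample rows, and the three helper functions are assumptions made for this example. It shows a tiling item aggregating many probes and the sequence used map relating sequence items to tiling items, and thus to probes.

```python
import sqlite3

# In-memory database; names and columns are hypothetical simplifications.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE chip_design (
    chip_design_id INTEGER PRIMARY KEY,
    part_number    TEXT
);
CREATE TABLE tiling_item (
    tiling_id      INTEGER PRIMARY KEY,
    chip_design_id INTEGER NOT NULL REFERENCES chip_design(chip_design_id)
);
-- Each probe belongs to exactly one tiling item (aggregation).
CREATE TABLE probe (
    probe_id       INTEGER PRIMARY KEY,
    tiling_id      INTEGER NOT NULL REFERENCES tiling_item(tiling_id),
    probe_x        INTEGER,
    probe_y        INTEGER,
    probe_sequence TEXT
);
-- Map table between sequence items and tiling items. The sequence item
-- table itself is omitted here, so sequence_id is a bare integer.
CREATE TABLE sequence_used (
    sequence_id INTEGER NOT NULL,
    tiling_id   INTEGER NOT NULL REFERENCES tiling_item(tiling_id),
    PRIMARY KEY (sequence_id, tiling_id)
);
""")

# Invented sample data: one chip, two tiling items, three probes.
db.execute("INSERT INTO chip_design VALUES (1, 'PN-0001')")
db.executemany("INSERT INTO tiling_item VALUES (?, ?)", [(100, 1), (101, 1)])
db.executemany("INSERT INTO probe VALUES (?, ?, ?, ?, ?)", [
    (1000, 100, 0, 0, "ACGTACGTACGTACGTACGTACGTA"),
    (1001, 100, 1, 0, "ACGTACGTACGTTCGTACGTACGTA"),
    (1002, 101, 2, 0, "TTGACCATTGACCATTGACCATTGA"),
])
db.executemany("INSERT INTO sequence_used VALUES (?, ?)",
               [(10, 100), (11, 100), (10, 101)])


def probes_on_chip(chip_design_id):
    """Probe identifiers on a chip design, reached through its tiling items."""
    rows = db.execute(
        "SELECT p.probe_id FROM probe p "
        "JOIN tiling_item t ON t.tiling_id = p.tiling_id "
        "WHERE t.chip_design_id = ? ORDER BY p.probe_id",
        (chip_design_id,)).fetchall()
    return [row[0] for row in rows]


def sequences_for_probe(probe_id):
    """Sequence identifiers related to a probe via its tiling item."""
    rows = db.execute(
        "SELECT u.sequence_id FROM probe p "
        "JOIN sequence_used u ON u.tiling_id = p.tiling_id "
        "WHERE p.probe_id = ? ORDER BY u.sequence_id",
        (probe_id,)).fetchall()
    return [row[0] for row in rows]


def probes_for_sequence(sequence_id):
    """Probe identifiers related to a sequence via the sequence used map."""
    rows = db.execute(
        "SELECT p.probe_id FROM sequence_used u "
        "JOIN probe p ON p.tiling_id = u.tiling_id "
        "WHERE u.sequence_id = ? ORDER BY p.probe_id",
        (sequence_id,)).fetchall()
    return [row[0] for row in rows]
```

In this sketch probe 1000 relates to two sequences and sequence 10 relates to three probes, which is the many-to-many correspondence between probe records and sequence item records.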



It is understood that the examples and embodiments
described herein are for illustrative purposes only and that
various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included
within the spirit and purview of this application and scope of
the appended claims. For example, tables may be deleted,
contents of multiple tables may be consolidated, or contents
of one or more tables may be distributed among more tables
than described herein to improve query speeds and/or to aid
system maintenance. Also, the database architecture and
data models described herein are not limited to biological
applications but may be used in any application. All
publications, patents, and patent applications cited herein are
hereby incorporated by reference.

What is claimed is: 

1. A computer-readable storage medium having stored 
thereon: 

a relational database comprising:
a probe table including a plurality of probe records, 
each of said probe records specifying a polymer 
probe for use in one or more polymer probe arrays; 
a sequence item table including a plurality of sequence 
item records, each of said sequence item records 
specifying a nucleotide sequence to be investigated 
in said one or more polymer probe arrays; and 
wherein there is a many-to-many relationship between 
said probe records and said sequence item records 
and at least one sequence item record corresponds to 
more than one probe record and at least one probe
record corresponds to more than one sequence item 
record. 



2. The medium of claim 1 wherein said relational database 
further comprises: 

a tiling item table including a plurality of tiling item 
records, each of said tiling item records having an 
aggregation relationship with said probe records so that 
each tiling item record has many associated probe 
records. 

3. The medium of claim 1 wherein said relational database 
further comprises: 

a genomic item table including a plurality of genomic
item records, each of said genomic item records
specifying a genomic item to be investigated by said one or
more polymer probe arrays; and

wherein there is a many to many relationship between 
genomic item records and sequence item records. 

4. The medium of claim 1 wherein said relational database 
further comprises: 

a chip design table including a plurality of chip design 
records, each of said chip design records specifying a 
design of a chip including a subset of said plurality of 
probe records. 

5. A computer implemented method for operating a
relational database comprising:

creating a probe table including a plurality of probe
records, each of said probe records specifying a
polymer probe for use in one or more polymer probe arrays;

creating a sequence item table including a plurality of
sequence item records, each of said sequence item
records specifying a nucleotide sequence to be
investigated in said one or more polymer probe arrays;
storing data in said probe table and said sequence item
table; and wherein there is a many-to-many relationship
between said probe records and said sequence item
records and at least one sequence item record
corresponds to more than one probe record and at least one
probe record corresponds to more than one sequence
item record.

6. The method of claim 5 further comprising the step of:
creating a tiling item table including a plurality of tiling
item records, each of said tiling item records having an
aggregation relationship with said probe records so that
each tiling item record has many associated probe
records.

7. The method of claim 5 further comprising the step of:
creating a genomic item table including a plurality of
genomic item records, each of said genomic item
records specifying a genomic item to be investigated by
said one or more polymer probe arrays; and
wherein there is a many to many relationship between
genomic item records and sequence item records.

8. The method of claim 5 further comprising the step of:
creating a chip design table including a plurality of chip
design records, each of said chip design records
specifying a design of a chip including a subset of said
plurality of probe records.
9. A computer system comprising: 
a processor; and 

a storage medium storing a relational database accessible 
by said processor, said storage medium having stored 
thereon: 

a relational database comprising: 

a probe table including a plurality of probe records, 
each of said probe records specifying a polymer 
probe for use in one or more polymer probe 
arrays; 

a sequence item table including a plurality of
sequence item records, each of said sequence item
records specifying a nucleotide sequence to be
investigated in said one or more polymer probe
arrays; and wherein there is a many-to-many
relationship between said probe records and at least
one sequence item record corresponds to more
than one probe record and at least one probe
record corresponds to more than one sequence
item record.

***** 